
    Auditory models for evaluating algorithms

    Hearing aids are tasked with the challenging job of compensating for an impaired, highly nonlinear auditory system. Historically, these devices have employed either linear processing or relatively unsophisticated nonlinear processing techniques. With increasingly accurate models of the auditory system, expanding computational power, and a growing number of objective measures that use these models, we are at a turning point in hearing aid design. Although subjective listener tests are often the most accepted methods for evaluating the quality and intelligibility of speech, they inherently treat the auditory system as a "black box." Conversely, model-based objective measures typically treat the auditory system as a cascade of physical processes. As a result, objective measures have the potential to provide more detailed information about how sound is processed and about where and why quality or intelligibility breaks down. Provided that model-based objective measures generalize, we can use them as tools for understanding how best to process degraded signals and, therefore, how best to design hearing aids. Generalizability is thus a key requirement. Because many of the well-known objective measures were developed for normal-hearing listeners in the context of audio codecs, it is unclear whether they generalize to predicting quality and intelligibility for hearing-impaired listeners, for "unknown" datasets (i.e., datasets on which the measures were not trained), and for distortions specific to hearing aids. Relatively recently, however, Kates and Arehart (Journal of the Audio Engineering Society, 2010) proposed the Hearing Aid Speech Quality Index (HASQI), a model-based objective measure that predicts quality for normal-hearing and hearing-impaired listeners by accounting for many of the distortions that hearing aids introduce.
HASQI addresses many of our concerns about generalizability in quality prediction, but its ability to predict quality on datasets on which it was not trained remains to be tested. Thus, we explore the robustness of HASQI by testing its ability to predict quality for "unknown" de-noised speech, and we directly compare its performance with that of other metrics in the literature.
M.S. Committee Chair: Rozell, Christopher; Committee Member: Anderson, David; Committee Member: Clements, Mar

    Structure in time-frequency binary masking

    Understanding speech in noisy environments is a challenge for normal-hearing and hearing-impaired listeners alike. However, it has been shown that speech intelligibility in these situations can be improved using a strategy called the ideal binary mask. Because this approach requires the speech and noise signals to be known separately, it is ill-suited for practical applications. To address this, many algorithms are being designed to approximate the ideal binary mask. Inevitably, these algorithms make errors, and the implications of these errors are not well understood. The main contributions of this thesis are a new framework for investigating binary masking algorithms and listener studies that use this framework to illustrate how certain types of algorithm errors can affect speech recognition outcomes for both normal-hearing listeners and cochlear implant recipients.
    Ph.D.
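    A minimal sketch of the ideal binary mask, assuming the per-unit speech and noise powers are available separately (the oracle knowledge that makes the strategy impractical): a time-frequency (T-F) unit is kept when its local SNR exceeds a local criterion. The function names and the hypothetical `lc_db` parameter with its 0 dB default are illustrative assumptions, not the thesis's implementation.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return a 0/1 mask: 1 where a T-F unit's local SNR exceeds lc_db.

    speech_power, noise_power: lists of rows (frequency channels) holding
    per-unit powers, assumed known separately (oracle knowledge).
    lc_db: hypothetical local criterion in dB.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            # Local SNR in dB; a tiny floor avoids log10(0) and division by zero.
            local_snr_db = 10.0 * math.log10(max(s, 1e-12) / max(n, 1e-12))
            row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture_power, mask):
    """Keep mixture energy only in units labeled speech-dominated."""
    return [[m * b for m, b in zip(m_row, b_row)]
            for m_row, b_row in zip(mixture_power, mask)]


if __name__ == "__main__":
    speech = [[4.0, 0.1], [1.0, 2.0]]
    noise = [[1.0, 1.0], [1.0, 0.5]]
    print(ideal_binary_mask(speech, noise))  # [[1, 0], [0, 1]]
```

    A unit at exactly 0 dB local SNR is discarded here because the comparison is strict; some definitions use `>=` instead.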

    Effect of Noise Reduction Gain Errors on Simulated Cochlear Implant Speech Intelligibility

    It has been suggested that the most important factor for obtaining high speech intelligibility in noise for cochlear implant (CI) recipients is preserving the low-frequency amplitude modulations of speech across time and frequency, for example, by minimizing the amount of noise in the gaps between speech segments. In contrast, it has also been argued that the transient parts of the speech signal, such as speech onsets, carry the most important information for speech intelligibility. The present study investigated the relative impact of these two factors on the potential benefit of noise reduction for CI recipients by systematically introducing noise estimation errors within speech segments, speech gaps, and the transitions between them. These noise estimation errors directly induce errors in the noise reduction gains within each of these regions. Speech intelligibility in both stationary and modulated noise was then measured using a CI simulation tested on normal-hearing listeners. The results suggest that minimizing noise in the speech gaps can improve intelligibility, at least in modulated noise. However, significantly larger improvements were obtained when the noise in the gaps was minimized and the speech transients were also preserved. These results imply that the ability to identify the boundaries between speech segments and speech gaps may be one of the most important factors for a noise reduction algorithm, because knowing the boundaries makes it possible both to minimize the noise in the gaps and to enhance the low-frequency amplitude modulations of the speech.
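    To make the link between noise estimation errors and gain errors concrete, here is a sketch under stated assumptions: a Wiener-style gain G = max(1 - N_hat / Y, G_min), computed per frame in a single band, and a hypothetical multiplicative error (in dB) applied to the noise estimate N_hat only in frames tagged with one region label. The abstract does not describe the study's gain rule, error model, or segmentation; `gains_with_estimation_error` and the labels "speech", "gap", and "transition" are illustrative.

```python
def noise_reduction_gain(noisy_power, noise_estimate, gain_floor=0.1):
    """Wiener-style gain 1 - N_hat / Y, clipped to [gain_floor, 1]."""
    if noisy_power <= 0.0:
        return gain_floor
    gain = 1.0 - noise_estimate / noisy_power
    return min(1.0, max(gain_floor, gain))


def gains_with_estimation_error(noisy, true_noise, labels, error_db, region):
    """Apply a noise-estimation error of error_db only to frames labeled `region`.

    noisy, true_noise: per-frame powers (single band, for brevity).
    labels: per-frame region tags (hypothetical segmentation).
    error_db > 0 overestimates the noise (extra attenuation);
    error_db < 0 underestimates it (noise leaks through).
    """
    factor = 10.0 ** (error_db / 10.0)
    gains = []
    for y, n, label in zip(noisy, true_noise, labels):
        n_hat = n * factor if label == region else n
        gains.append(noise_reduction_gain(y, n_hat))
    return gains


if __name__ == "__main__":
    noisy = [2.0, 1.0, 1.5]
    noise = [1.0, 1.0, 0.5]
    labels = ["speech", "gap", "transition"]
    print(gains_with_estimation_error(noisy, noise, labels, 0.0, "speech"))
    print(gains_with_estimation_error(noisy, noise, labels, 3.0, "speech"))
    print(gains_with_estimation_error(noisy, noise, labels, -10.0, "gap"))
```

    In this toy setup, a +3 dB overestimate in the speech frame pushes its gain to the floor (speech is attenuated), whereas a -10 dB underestimate in the gap raises the gap's gain from the floor to 0.9, so noise that should have been suppressed passes through.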

    Robustness of the Hearing Aid Speech Quality Index (HASQI)

    Objective measures of speech quality have been the subject of significant prior work, particularly in the areas of speech codecs and communication channels for normal-hearing listeners. One of the primary concerns of researchers in this area is how these metrics generalize to datasets or listener studies that are “unknown” to the measures. Another growing concern is how these metrics perform for the hearing-impaired community. Researchers working with this community need to be able to predict how hearing-impaired listeners will perceive the quality of speech, including speech processed specifically by hearing aids. A relatively recent metric, the Hearing Aid Speech Quality Index (HASQI), is a model-based objective measure of quality developed in the context of hearing aids for normal-hearing and hearing-impaired listeners (Kates & Arehart, Journal of the Audio Engineering Society, 2010). As such, HASQI makes substantial progress on some of the generalization issues. However, HASQI has so far not been tested on any dataset other than the one on which it was trained. The objective of this study is to demonstrate the robustness of HASQI in predicting subjective quality. We use an “unknown” dataset of noisy speech processed by noise suppression algorithms, along with a corresponding set of subjective quality scores from normal-hearing listeners, to demonstrate HASQI’s prediction performance. Furthermore, we compare HASQI’s performance with that of several other objective measures to provide a point of reference. Index Terms: Hearing Aid Speech Quality Index (HASQI), objective measure, speech quality assessment
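    The abstract does not state which statistics quantify HASQI's prediction performance; correlation between objective scores and subjective ratings across conditions is a common choice. The sketch below assumes HASQI scores and mean subjective scores per condition are already available (computing HASQI itself requires the full auditory model and is not shown) and computes Pearson's r and Spearman's rank correlation, with tied values sharing their average rank. The function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between objective scores x and subjective scores y.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

    Reporting both is a common way to separate linear agreement (Pearson) from agreement in the ordering of conditions (Spearman), which matters when an objective measure is related to ratings by a monotonic but nonlinear mapping.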

    Evaluating the Generalization of the Hearing Aid Speech Quality Index (HASQI)


    Cochlear implant speech intelligibility outcomes with structured and unstructured binary mask errors

    It has been shown that intelligibility can be improved for cochlear implant (CI) recipients with the ideal binary mask (IBM). In realistic scenarios where prior information is unavailable, however, the IBM must be estimated, and these estimates will inevitably contain errors. Although the effects of both unstructured and structured binary mask errors have been investigated with normal-hearing (NH) listeners, they have not been investigated with CI recipients. This study assesses these effects with CI recipients using masks that were generated systematically with a statistical model. The results demonstrate that clustering of mask errors substantially decreases listeners' tolerance of errors, that incorrectly removing target-dominated regions can be as detrimental to intelligibility as incorrectly adding interferer-dominated regions, and that the individual tolerances of the different types of errors can change when both are present. These trends follow those of NH listeners. However, analysis with a mixed-effects model suggests that CI recipients tend to be less tolerant than NH listeners of mask errors in most conditions, at least with respect to the testing methods used in each of the studies. This study clearly demonstrates that structure influences the tolerance of errors and therefore should be considered when analyzing binary-masking algorithms.
    11 page(s)
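    The abstract does not specify the statistical model used to generate the masks. As a swapped-in illustration only, the sketch below draws unstructured errors independently per T-F unit and structured (clustered) errors from a two-state Markov chain along time, with both having the same long-run error rate `p`. Flipping a 1 in the ideal mask yields a miss (target-dominated unit removed) and flipping a 0 yields a false alarm (interferer-dominated unit added). All names and parameters are hypothetical.

```python
import random


def error_pattern(n_channels, n_frames, p, persistence=None, seed=0):
    """Boolean grid (channels x frames) marking which T-F units are in error.

    persistence=None: unstructured errors, each unit independently with
    probability p. Otherwise: structured errors from a two-state Markov chain
    along time in each channel, where persistence = P(error | previous error).
    The onset probability is chosen so the long-run error rate stays p.
    """
    if not 0.0 <= p < 1.0 or n_frames < 1:
        raise ValueError("need 0 <= p < 1 and n_frames >= 1")
    rng = random.Random(seed)
    if persistence is None:
        return [[rng.random() < p for _ in range(n_frames)]
                for _ in range(n_channels)]
    if not p <= persistence <= 1.0:
        raise ValueError("persistence must lie in [p, 1]")
    # Stationary error rate onset / (1 - persistence + onset) equals p.
    onset = p * (1.0 - persistence) / (1.0 - p)
    grid = []
    for _ in range(n_channels):
        row = [rng.random() < p]  # first unit drawn at the stationary rate
        for _ in range(n_frames - 1):
            row.append(rng.random() < (persistence if row[-1] else onset))
        grid.append(row)
    return grid


def apply_errors(ibm, errors):
    """Flip mask units where errors is True (1 -> 0 miss, 0 -> 1 false alarm)."""
    return [[1 - b if e else b for b, e in zip(b_row, e_row)]
            for b_row, e_row in zip(ibm, errors)]


def count_runs(row):
    """Number of contiguous runs of errors in one channel."""
    return sum(1 for i, e in enumerate(row) if e and (i == 0 or not row[i - 1]))
```

    With `persistence` equal to `p`, the chain reduces to independent errors; raising it toward 1 keeps the error rate fixed while grouping errors into fewer, longer runs, which is the kind of structure the study found to reduce error tolerance.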

    Estimated and ideal time-frequency masks.

    Masks for a CLUE sentence mixed with ICRA7 noise at −5 dB SNR. The spectrograms of clean and noisy speech are shown in Figs 3a and 3b. The IRM and the IBM are shown in Figs 3c and 3e. A selection of estimated masks from different system configurations is shown in Figs 3d, 3f, 3g and 3h. Misses (speech-dominated T-F units erroneously labeled as noise-dominated) and false alarms (noise-dominated T-F units erroneously labeled as speech-dominated) are shown on top of the estimated IBMs. The estimated IBM in Fig 3h was converted from the corresponding estimated IRM by applying a threshold, which was derived from Eq (1) (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0196924#pone.0196924.e001) at −5 dB SNR using β = 0.5.
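    The caption does not reproduce Eq (1). Assuming it is the common ideal ratio mask form IRM = (S / (S + N))^β, with S and N the speech and noise powers in a T-F unit, one way to derive the threshold is to take the IRM value at which the local SNR equals the criterion (here −5 dB). The sketch below follows that assumption and counts misses and false alarms of a binarized mask against a reference IBM; the function names are hypothetical.

```python
def irm(speech_power, noise_power, beta=0.5):
    """Ideal ratio mask for one T-F unit, assuming IRM = (S / (S + N)) ** beta."""
    return (speech_power / (speech_power + noise_power)) ** beta


def irm_threshold(lc_db, beta=0.5):
    """IRM value reached when the local SNR equals lc_db (assumed Eq (1) form)."""
    rho = 10.0 ** (lc_db / 10.0)  # local SNR as a linear power ratio
    return (rho / (1.0 + rho)) ** beta


def binarize(ratio_mask, threshold):
    """Convert a ratio mask (list of rows) to a binary mask."""
    return [[1 if v > threshold else 0 for v in row] for row in ratio_mask]


def count_misses_false_alarms(estimated, reference):
    """Misses: reference 1 labeled 0. False alarms: reference 0 labeled 1."""
    misses = false_alarms = 0
    for e_row, r_row in zip(estimated, reference):
        for e, r in zip(e_row, r_row):
            if r == 1 and e == 0:
                misses += 1
            elif r == 0 and e == 1:
                false_alarms += 1
    return misses, false_alarms
```

    Because the IRM increases monotonically with local SNR, thresholding an ideal IRM at `irm_threshold(lc_db)` reproduces the IBM with the same criterion; at −5 dB and β = 0.5 the threshold is about 0.49.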

    Measured WRSs in normal-hearing listeners at −5 dB SNR in the ICRA7 noise.

    Unprocessed noisy speech served as a baseline condition. For the baseline (diamonds), sample means across subjects and 95% Student’s t-based confidence intervals of the mean were computed. For the system configurations, the least-squares means predicted by the linear mixed-effects model and their 95% confidence limits were plotted.
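    For the baseline condition, a 95% Student's t-based confidence interval of the mean is mean ± t(0.975, n − 1) · s / √n, where s is the sample standard deviation across n subjects. The sketch below computes it with only the Python standard library, obtaining the t quantile by Simpson's-rule integration of the density plus bisection (a choice made here to avoid external dependencies). It does not reproduce the least-squares means from the linear mixed-effects model, which would require a mixed-model fitting library.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1.0 + x * x / df) ** (-(df + 1) / 2.0)


def t_cdf_nonneg(x, df, steps=2000):
    """CDF at x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * t_pdf(k * h, df)
    return 0.5 + total * h / 3.0


def t_quantile(prob, df, upper=100.0, iterations=50):
    """Quantile for 0.5 <= prob < 1 by bisection on the CDF over [0, upper].

    upper=100 covers the 0.975 quantile for every df >= 1.
    """
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if t_cdf_nonneg(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_confidence_interval(samples, level=0.95):
    """Sample mean and (lower, upper) t-based confidence limits of the mean."""
    n = len(samples)
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    t_crit = t_quantile(0.5 + level / 2.0, n - 1)
    return mean, mean - t_crit * sem, mean + t_crit * sem
```

    In practice a statistics library's t quantile function would replace the integration and bisection; the hand-rolled version is only there to keep the sketch dependency-free.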